matrix factorization
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > California > Riverside County > Riverside (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- Health & Medicine (0.46)
- Government (0.46)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- Asia > Middle East > Israel (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Preconditioning Benefits of Spectral Orthogonalization in Muon
Ma, Jianhao, Huang, Yu, Chi, Yuejie, Chen, Yuxin
The Muon optimizer, a matrix-structured algorithm that leverages spectral orthogonalization of gradients, is a milestone in the pretraining of large language models. However, the underlying mechanisms of Muon -- particularly the role of gradient orthogonalization -- remain poorly understood, with very few works providing end-to-end analyses that rigorously explain its advantages in concrete applications. We take a step toward such an understanding by studying the effectiveness of a simplified variant of Muon through two case studies: matrix factorization and in-context learning of linear transformers. For both problems, we prove that simplified Muon converges linearly with iteration complexities independent of the relevant condition number, provably outperforming gradient descent and Adam. Our analysis reveals that the Muon dynamics decouple into a collection of independent scalar sequences in the spectral domain, each exhibiting similar convergence behavior. Our theory formalizes the preconditioning effect induced by spectral orthogonalization, offering insight into Muon's effectiveness in these matrix optimization problems and potentially beyond.
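To make the orthogonalization step concrete, here is a minimal NumPy sketch of a simplified Muon update on a matrix factorization loss. It is not the authors' algorithm: the exact polar factor is computed via SVD (practical Muon implementations approximate it with Newton-Schulz iterations), momentum is omitted, and the loss, rank, and step size are illustrative.

```python
import numpy as np

def orthogonalize(G):
    """Map G = U diag(s) V^T to U V^T, setting every nonzero singular
    value to 1; this is the spectral preconditioning the abstract refers
    to (practical Muon approximates it with Newton-Schulz iterations)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def simplified_muon_matfac(A, r, lr=0.1, steps=500, seed=0):
    """Fit A ~ X @ Y.T with orthogonalized gradient steps (sketch only)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = 0.1 * rng.standard_normal((m, r))
    Y = 0.1 * rng.standard_normal((n, r))
    for _ in range(steps):
        R = X @ Y.T - A              # residual of 0.5 * ||X Y^T - A||^2
        GX, GY = R @ Y, R.T @ X      # Euclidean gradients
        X -= lr * orthogonalize(GX)  # step is blind to gradient conditioning
        Y -= lr * orthogonalize(GY)
    return X, Y
```

Because the orthogonalized update has unit spectral norm no matter how ill-conditioned the raw gradient is, every spectral direction makes comparable progress, which matches the decoupling into scalar spectral sequences described above; near the optimum a decaying step size would be needed, which this sketch omits.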
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Matrix Factorization Framework for Community Detection under the Degree-Corrected Block Model
Dache, Alexandra, Vandaele, Arnaud, Gillis, Nicolas
Community detection is a fundamental task in data analysis. Block models form a standard approach to partition nodes according to a graph model, facilitating the analysis and interpretation of the network structure. By grouping nodes with similar connection patterns, they enable the identification of a wide variety of underlying structures. The degree-corrected block model (DCBM) is an established model that accounts for the heterogeneity of node degrees. However, existing inference methods for the DCBM are heuristics that are highly sensitive to initialization, typically done randomly. In this work, we show that DCBM inference can be reformulated as a constrained nonnegative matrix factorization problem. Leveraging this insight, we propose a novel method for community detection and a theoretically well-grounded initialization strategy that provides an initial estimate of communities for inference algorithms. Our approach is agnostic to any specific network structure and applies to graphs with any structure representable by a DCBM, not only assortative ones. Experiments on synthetic and real benchmark networks show that our method detects communities comparable to those found by DCBM inference, while scaling linearly with the number of edges and communities; for instance, it processes a graph with 100,000 nodes and 2,000,000 edges in approximately 4 minutes. Moreover, the proposed initialization strategy significantly improves solution quality and reduces the number of iterations required by all tested inference algorithms. Overall, this work provides a scalable and robust framework for community detection and highlights the benefits of a matrix-factorization perspective for the DCBM.
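As a rough illustration of the matrix-factorization view of community detection, here is a generic symmetric-NMF sketch in NumPy. It is not the paper's method: the DCBM-specific constraints, the degree correction, and the proposed initialization strategy are not reproduced, and the damped multiplicative update is a standard heuristic.

```python
import numpy as np

def symnmf_communities(A, k, iters=300, eps=1e-9, seed=0):
    """Toy community detector: approximate the adjacency matrix A by
    H @ H.T with H >= 0, then assign each node to its largest factor
    column. Stand-in for the paper's constrained-NMF reformulation."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(iters):
        # Damped multiplicative update for min ||A - H H^T||_F^2, H >= 0
        H *= 0.5 + 0.5 * (A @ H) / (H @ (H.T @ H) + eps)
    return H.argmax(axis=1)  # hard community labels
```

With a sparse adjacency matrix (e.g., scipy.sparse), the dominant cost per iteration is the A @ H product at O(k |E|), consistent with the linear-in-edges scaling the abstract reports for the authors' method.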
- Europe > United Kingdom > Scotland (0.04)
- North America > United States > South Carolina (0.04)
- Europe > Belgium (0.04)
A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue
Qi, Qianqian, van der Heijden, Peter G. M.
Across fields such as machine learning, social science, and geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegativity or row-sum-to-1 constraints. Although these models are to a large extent similar or even equivalent, they are presented under different names, and their similarity is not well known. This paper highlights the similarities among five popular models: latent budget analysis (LBA), latent class analysis (LCA), end-member analysis (EMA), probabilistic latent semantic analysis (PLSA), and nonnegative matrix factorization (NMF). We focus on an essential issue for these models, identifiability, and prove that the solutions of LBA, EMA, LCA, and PLSA are unique if and only if the solution of NMF is unique. We also provide a brief review of algorithms for these models. We illustrate the models with a time-budget dataset from social science, and end the paper with a discussion of closely related models such as archetypal analysis.
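To make the claimed similarity concrete, the LaTeX fragment below writes out the shared factorization. The notation (W, H, the diagonal rescaling D) is ours rather than the paper's, and the final remark is only the intuition behind the identifiability transfer, not its proof.

```latex
% Shared core: a nonnegative matrix factorization
X \approx W H, \qquad W \ge 0, \; H \ge 0.

% PLSA (and, after rescaling, LBA, LCA, and EMA) adds sum-to-one
% (probability) constraints on the factors:
P(w_i, d_j) = \sum_{z=1}^{K} P(z)\, P(w_i \mid z)\, P(d_j \mid z).

% Passing between the parameterizations only requires diagonal
% rescalings, W H = (W D)(D^{-1} H), which is why uniqueness of a
% solution of one model corresponds to uniqueness for the others.
```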
- Asia > Middle East > Jordan (0.04)
- Asia > Singapore (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Implicit Regularization in Deep Matrix Factorization
Arora, Sanjeev, Cohen, Nadav, Hu, Wei, Luo, Yuping
Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low complexity. We study the implicit regularization of gradient descent over deep linear neural networks for matrix completion and sensing, a model referred to as deep matrix factorization. Our first finding, supported by theory and experiments, is that adding depth to a matrix factorization enhances an implicit tendency towards low-rank solutions, oftentimes leading to more accurate recovery. Secondly, we present theoretical and empirical arguments questioning a nascent view by which implicit regularization in matrix factorization can be captured using simple mathematical norms. Our results point to the possibility that the language of standard regularizers may not be rich enough to fully encompass the implicit regularization brought forth by gradient-based optimization.
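Below is a minimal NumPy sketch of the model the abstract studies, assuming square factors and full-batch gradient descent on the observed entries; the near-zero initialization, depth, and step size are illustrative choices.

```python
import numpy as np

def deep_matfac_completion(M, mask, depth=3, lr=0.3, steps=5000,
                           init=1e-3, seed=0):
    """Matrix completion with a depth-N linear network W_N ... W_1.

    `mask` is a 0/1 array marking the observed entries of M. The small
    near-zero initialization is what drives the implicit low-rank bias;
    depth=1 recovers plain (shallow) matrix factorization."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    Ws = [init * rng.standard_normal((n, n)) for _ in range(depth)]

    def product(factors):
        P = factors[0]
        for W in factors[1:]:
            P = W @ P
        return P

    for _ in range(steps):
        R = mask * (product(Ws) - M)                    # masked residual
        grads = []
        for i in range(depth):
            left = product([np.eye(n)] + Ws[i + 1:])    # W_N ... W_{i+1}
            right = product([np.eye(n)] + Ws[:i])       # W_{i-1} ... W_1
            # d/dW_i of 0.5 * ||mask * (W_N ... W_1 - M)||_F^2
            grads.append(left.T @ R @ right.T)
        for W, g in zip(Ws, grads):
            W -= lr * g
    return product(Ws)
```

One way to probe the abstract's first finding is to compare np.linalg.svd(...)[1] of the recovered matrix for depth 1, 2, and 3 on the same mask: the singular-value spectrum should decay faster at larger depth.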
Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms
Mukkamala, Mahesh Chandra, Ochs, Peter
Matrix factorization is a popular non-convex optimization problem, for which alternating minimization schemes are mostly used. They usually suffer from the major drawback that the solution is biased towards one of the optimization variables. A remedy is offered by non-alternating schemes. However, due to a lack of Lipschitz continuity of the gradient in matrix factorization problems, convergence cannot be guaranteed. A recently developed approach relies on the concept of Bregman distances, which generalize the standard Euclidean distance. We exploit this theory by proposing a novel Bregman distance for matrix factorization problems, which, at the same time, allows for simple, closed-form update steps. Therefore, for non-alternating schemes, such as the recently introduced Bregman Proximal Gradient (BPG) method and its inertial variant Convex-Concave Inertial BPG (CoCaIn BPG), convergence of the whole sequence to a stationary point is proved for matrix factorization. In several experiments, we observe a superior performance of our non-alternating schemes in terms of speed and objective value at the limit point.
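Here is a sketch of one non-alternating Bregman proximal gradient step for f(U, V) = 0.5 * ||A - U V^T||_F^2, assuming a quartic-plus-quadratic kernel h of the kind used in this line of work. The constants a and b, and the step size tau, are illustrative, and CoCaIn's inertial term is omitted.

```python
import numpy as np

def bpg_matfac_step(U, V, A, tau=0.9, a=3.0, b=1.0):
    """One joint BPG step for f(U, V) = 0.5 * ||A - U @ V.T||_F^2.

    Assumed kernel: h(U, V) = (a/4)*r**4 + (b/2)*r**2 with
    r**2 = ||U||_F^2 + ||V||_F^2, so grad h = (a*r**2 + b) * (U, V).
    The update solves grad h(Z+) = grad h(Z) - tau * grad f(Z) in
    closed form, up to a scalar cubic equation for the new radius."""
    R = U @ V.T - A
    gU, gV = R @ V, R.T @ U                  # Euclidean gradients of f
    r2 = (U ** 2).sum() + (V ** 2).sum()
    scale = a * r2 + b                       # grad-h multiplier at (U, V)
    PU = scale * U - tau * gU                # target value of grad h(Z+)
    PV = scale * V - tau * gV
    p = np.sqrt((PU ** 2).sum() + (PV ** 2).sum())
    # New radius s solves a*s**3 + b*s = p; the cubic is strictly
    # increasing, so Newton's method from s0 = (p/a)**(1/3) converges.
    s = (p / a) ** (1.0 / 3.0)
    for _ in range(30):
        s -= (a * s ** 3 + b * s - p) / (3 * a * s ** 2 + b)
    c = 1.0 / (a * s ** 2 + b)               # Z+ = c * (PU, PV)
    return c * PU, c * PV
```

Because grad h of this radial kernel is a scalar multiple of the variables, the step never has to solve a matrix equation, which is what makes the simple closed-form updates mentioned in the abstract possible.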